Systematic biases in parameter estimation of binary black-hole mergers 
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Parameter estimation of binary-black-hole merger events in gravitational-wave data relies on 
matched-filtering techniques, which, in turn, depend on accurate model waveforms. Here we charac- 
terize the systematic biases introduced in measuring astrophysical parameters of binary black holes 
by applying the currently most accurate effective-one-body templates to simulated data containing 
non-spinning numerical-relativity waveforms. For advanced ground-based detectors, we find that 
the systematic biases are well within the statistical error for realistic signal-to-noise ratio (SNR). 
These biases grow to be comparable to the statistical errors at high ground-based-instrument SNRs 
(SNR ~ 50), but never dominate the error budget. At the much larger signal-to-noise ratios ex- 
pected for space-based detectors, these biases will become large compared to the statistical errors, 
but for astrophysical black hole mass estimates the absolute biases (of at most a few percent) are 
still fairly small. 

PACS numbers: 



I. INTRODUCTION 

Binary-black-holc (BBH) coalescences are cornerstone 
sources for gravitational-wave (GW) detectors, be they 
existing ground-based detectors like LIGO [1] and 
Virgo [2], planned space-based detectors such as classic 
LISA [3] or eLISA [4], or a pulsar timing array [5]. The 
analysis of GW data to detect and characterize binary- 
black-hole merger events and to test the predictions of 
general relativity requires some family of efficiently com- 
putable signal models, representing all the possible wave- 
forms consistent with general relativity. 

Modeling BBH waveforms has historically been sepa- 
rated into three regimes, distinguished by the different 
computational procedures suitable for each. The "inspi- 
ral" where the individual black holes are sufficiently sep- 
arated for post-Newtonian (PN) theory to be valid [6-9] , 
the "merger" of the binary where numerical relativity 
(NR) is needed [10-12], and the "ringdown" phase of a 
single, post-merger, perturbed object relaxing to a Kerr 
black hole [13, 14]. 

During recent years, work at the interface between 
analytical and numerical relativity has provided the 
community with a variety of semi-analytical inspiral- 
merger-ringdown waveform sets [15-26] of varying scope 
in parameter range and accuracy. These waveforms 
have already been used to search for GWs from high- 
mass [27, 28] and intermediate-mass [29] binary black 
holes in LIGO and Virgo data, and also to carry out pre- 
liminary parameter-estimation studies for ground-based 
detectors [30] and space-based detectors, such as clas- 
sic LISA [31, 32]. In this paper we shall study a 



set of inspiral-merger-ringdown waveforms based on the 
Effective-One-Body (EOB) framework [33-37]. 

Template waveforms will always be an approximation 
to the true signals and the difference, if large enough, 
can bias inferences made from the GW data about the 
astrophysical parameters of the system or the validity 
of general relativity. Estimates of the systematic errors 
introduced by waveform approximants in the literature 
[25, 38-42] have focused only on the inspiral, or used 
general conservative criteria to determine when the wave- 
form has a bias, never using the parameter-estimation 
techniques employed in actual data analysis. 

In this work, we will carry out the first measurement 
of systematic biases introduced when determining the 
physical parameters of a BBH merger by using EOB 
waveforms as templates. We do so by simulating data 
that contain NR waveforms as the "signal" to be de- 
tected. Then, using the Markov Chain Monte Carlo 
(MCMC) method [43, 44], we sample the posterior dis- 
tribution function for the binary parameters using EOB 
waveforms as the templates. The characteristic width 
of the posterior as determined by the MCMC is taken 
as the statistical error, while the distance in parameter 
space between the dominant mode of the posterior and 
the true, or "injected" waveform parameters is the sys- 
tematic error, or bias, introduced by these waveforms. 
We study several different BBH systems, sampling the 
total mass, mass-ratio, and signal-to-noise ratio (SNR) 
space for both a LIGO/ Virgo network in the advanced- 
detector era [45, 46] and a LISA-like configuration. 

For this first analysis we employ non-spinning wave- 
forms for quasi-circular orbits. In particular, for the in- 
jected signals, we consider the NR waveforms produced 
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by the Caltech- Cornell- CITA collaboration in Ref. [47]. 
For the templates we use the EOB waveforms that were 
calibrated in Ref. [25] to those NR waveforms [47]. Be- 
cause our emphasis is on BBHs, and the merger wave- 
forms in particular, it is certainly the case that spin 
magnitude and orientation play an important role in 
the waveform, and thus in parameter estimation [48-52] . 
NR waveforms with spins aligned or anti-aligned with 
the orbital angular momentum are available, and EOB 
waveforms that include spins have been developed in 
Refs. [22, 26, 37]. However, the spinning EOB waveforms 
are currently restricted to the dominant mode and addi- 
tional code development is needed before they can be em- 
ployed in stochastic sampling methods like the MCMC. 
Thus, we leave to the future the extension of this study 
to spinning BBHs. 

Within these limitations, we show that the EOB wave- 
forms developed in Ref. [25] and tested here are accu- 
rate enough to introduce little to no significant biases 
when the data contain NR waveforms at SNRs con- 
sistent with expectations for likely LIGO/Virgo detec- 
tions (SNR < 50). For LISA-like detections, where 
the expected SNRs are much higher than for ground- 
based detectors, statistically significant biases do emerge. 
Nonetheless, we find that the discrepancies between the 
true and measured parameters, at a few percent for the 
black-hole masses, are small enough to not impact key as- 
trophysical conclusions that may be drawn from the data 
(e.g., black-hole seed models, etc.) [4, 53, 54]. However, 
when very high accuracies are required, as when testing 
the validity of general relativity [55, 56], best- fit EOB 
waveforms from the existing model will leave behind sig- 
nificant residual power, making them ill suited for these 
applications without further development. 

The remainder of the paper is organized as follows. 
In Sec. H we describe the numerical and analytic wave- 
forms used in this work. In Sec. Ill we lay out how the 
study will proceed, describing in particular the MCMC 
sampler that we use. We then discuss in detail the re- 
sults for stellar-mass BBHs in ground-based detectors 
(Sec. IV) and supermassive BBHs in space-based obser- 
vatories (Sec. V). In Sec. VI we summarize the findings 
from this work, address limitations, and discuss future 
directions to be pursued. 



II. INSPIRAL-MERGER-RINGDOWN 
WAVEFORMS USED IN THE ANALYSIS 

Our study involves comparisons between two sets of 
waveforms. We primarily seek to evaluate a continuously 
parameterizable family of model waveforms based on the 
EOB framework against a discrete set of highly accurate 
numerical relativity (NR) waveforms. In analogy with 
observational algorithm tests, we can think of the numer- 
ical waveforms as "injected" signals, which we challenge 
the "template" EOB waveforms to match. 

We employ as injected signals the non-spinning NR 



waveforms produced by the Caltech-Cornell-CITA col- 
laboration [47], using the spectral Einstein code. The 
NR polarizations have mass-ratio q = 1711/1112 = 
1,2,3,4,6 and contain -2 spin-weighted spherical har- 
monies (£,m) = (2, ±2), (2,±1), (2,0), (3, ±3), (3, ±2), 
(4, ±4), (5, ±5), and (6, ±6). These waveforms provide 
30-40 (quadrupole) GW cycles before merger, depend- 
ing on the mass-ratio. 

The phase and amplitude errors of the NR waveforms 
vary with mass-ratio and gravitational mode. The nu- 
merical errors grow toward merger and ringdown, and 
typically at merger, for the dominant (2, 2) mode, the 
phase error ranges between 0.05 and 0.25 rad, while the 
fractional amplitude error is at most 1%. The subdomi- 
nant modes can have somewhat larger errors, especially 
the (3,3) and (4,4) modes. 

Applying the numerical waveforms to generate mock 
signal observations, we will test the ability of a previously 
published family of template waveforms to characterize 
these signals [25]. These template waveforms are based 
on the EOB framework, founded on the very accurate re- 
sults of FN theory, an expansion of general-relativity dy- 
namics in powers of v/c, where v is the characteristic ve- 
locity of the binary. In the EOB approach, however, the 
FN expansions are applied in a resummed form that maps 
the dynamics of two compact objects into the dynam- 
ics of a reduced-mass test particle moving in a deformed 
Schwarzschild geometry [33-37] . Waveforms in the EOB 
formalism are derived from such particle dynamics up to 
the light-ring (unstable photon orbit) radius. The subse- 
quent ringdown portion of the waveforms is a superposi- 
tion of quasinormal modes matched continuously to the 
inspiral. Tunable parameters, effectively standing in for 
currently unknown higher-order FN terms, are fixed by 
matching to numerical-relativity simulation results. 

The comparable-mass NR waveforms in Ref. [47] were 
used, together with the small-mass-ratio waveforms pro- 
duced by the Tcukolsky code in Ref. [57], to cali- 
brate a non-spinning EOB model in Ref. [25]. More 
specifically, the numerical waveforms available at dis- 
crete points in the parameter space were employed 
to fix a handful of EOB adjustable parameters enter- 
ing the EOB conservative dynamics and gravitational 
modes. These adjustable parameters were then interpo- 
lated over the entire mass-ratio space. The EOB model in 
Ref. [25] contains four subdominant gravitational modes, 
(2, ±1), (3, ±3), (4, ±4) and (5, ±5), beyond the dominant 
mode (2, ±2). 

The EOB model in Ref. [25] has been coded in the 
(public) LIGO Algorithm Library (LAL) [58] (under the 
name E0BNRv2). We carry out our study using LAL 
to generate template waveforms. Henceforth, we denote 
the EOB model with only the dominant (2, ±2) mode 
as EOB22, and the model that includes the four sub- 
dominant modes (2, ±1), (3, ±3), (4, ±4) and (5, ±5) as 
EOBhh- We will also omit the ± in mode labels. 

The phase difference of the (2, 2) mode between the 
calibrated EOB model and numerical simulation remains 
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below ~ 0.1 rad throughout the evolution for all mass- 
ratios considered; the fractional amplitude difference at 
merger (i.e., at the waveform's peak) of the (2,2) mode 
is 2%, growing to 12% during the ringdown. Around 
merger and ringdown, the phase and amplitude differ- 
ences of the subdominant modes between the EOB and 
NR waveforms are somewhat larger than those of the 
(2, 2) mode. [The numerical errors, and phase and am- 
plitude differences between the EOB and NR waveforms 
can be read off from Figs. 6-10 in Ref. [25].] 

To quantify how these differences between template 
and signal would affect GW searches in Advanced LIGO, 
Ref. [25] studied the effectualness and measurement ac- 
curacy of the EOB model. When investigating the ef- 
fectualness for detection purposes, they found that the 
NR polarizations containing the strongest seven modes ^ 
have a maximum mismatch of 7% for stellar-mass BBHs, 
and 10% for intermediate-mass BBHs, when using only 
EOB22 for q = 1,2, 3, 4, 6 and binary total masses 20-200 
Mq. However, the mismatches decrease when using the 
full EOBhh model, reaching an upper bound of 0.5% 
for stellar-mass BBHs, and 0.8% for intermediate-mass 
BBHs. Thus, the EOB model developed in Ref. [25] is 
accurate enough for detection, which generally requires 
a mismatch not larger than 7%. 

To understand whether this EOB model is pre- 
cise enough for measurement purposes, the authors of 
Ref. [25] carried out a preliminary study, adopting as ac- 
curacy requirement for measurement the one proposed 
in Refs. [41, 42]. Using a single Advanced LIGO detec- 
tor, Ref. [25] computed the SNRs below which the EOB 
polarizations are accurate enough that systematic biases 
are smaller than statistical errors. Since subdominant 
modes have non- negligible contribution for large mass- 
ratios, and those modes have the largest amplitude er- 
rors, they found that the upper-bound SNRs are lower 
for the most asymmetric systems, such as q = 6. How- 
ever, as stressed in Ref. [25], the accuracy requirement in 
Ref. [41, 42] may be too conservative, and by itself does 
not say which of the binary parameters will be biased 
and how large the bias will be. It could turn out that the 
biased parameters have little relevance in astrophysics or 
tests of general relativity. It is the main goal of this pa- 
per to measure the actual biases of the EOB model with 
and without the subdominant modes. 

Our study is restricted to binary systems moving along 
quasi-circular orbits where the spin of each constituent 
black hole is negligible, thus reducing the model param- 
eters from a space of 17 dimensions to 9 dimensions: 



6 = {lnM,\nM,\nDi,,tp,sinS,a,cosL,^p,ipp} . (1) 

In the above parameter list, il/ = (1 + z) (mi + 777,2) is the 
redshifted total mass of the binary, and M = v^^^ M is 



its chirp mass, where v = mxm^jirnx +7x1-2)'^ = qjiX + qf' 
is the symmetric mass-ratio. We denote by the lumi- 
nosity distance, which, along with the right ascension 
a and declination b, describes the location of the bi- 
nary. The orientation of the binary's orbital angular- 
momentum vector L with respect to the line of sight k 
from the observer is encoded in the model using the in- 
clination L, polarization angle -0, and phase ip^ — the 
Euler angles that describe the rotation from k to L. The 
parameter tp is the time of the (2, 2) mode's maximum 
amplitude, a proxy for the binary merger time, and 
is the GW phase at ip. 

Our comparisons between the EOB waveforms and 
the NR data are restricted to the late-inspiral, merger 
and ringdown, that is roughly 30-40 GW cycles before 
merger, depending on the mass-ratio. The injected sig- 
nals contain only the NR waveforms available. We do not 
match the NR waveforms to EOB or PN waveforms at 
low frequency to increase the number of cycles, because 
we do not know how well the EOB or PN waveforms 
would approximate the NR waveforms outside the region 
of calibration and we do not want to introduce unknown 
errors when estimating the systematic biases. 

There is no guarantee that a template that is in phase 
with the NR waveform during the last 30-40 GW cy- 
cles will remain so throughout the entire inspiral. The 
mass parameters are strongly encoded in the GW phase, 
so any additional de-phasing at times earlier than cov- 
ered by the NR simulations can potentially increase the 
systematic biases. Therefore, in order for our study to 
be meaningful, we consider binary systems with a total 
mass M such that the majority of the SNR is accumu- 
lated during the last 30-40 GW cycles before merger. 



III. STATISTICAL VERSUS SYSTEMATIC 
ERRORS USING MCMC TECHNIQUES 

We use the MCMC algorithm [43, 44] to produce sam- 
ples from the posterior distribution function for the wave- 
form parameters. The MCMC sampler is built on the 
foundation of Bayes' Theorem, which, in the context of 
parameter inference, defines the posterior distribution 
function for parameter vector d and data d as 



v{d\X) 



(2) 



^ Note that in Ref. [25] the (2, 0) mode was not included. 



Here p(-|-) are conditional probability densities with ar- 
guments on the right-hand side of the bar assumed to 
be true, 'p{d\B,X) is the likelihood, j)(Q\L) is the prior 
distribution, and ]3(d\X) is the model evidence, which, in 
parameter-estimation applications, serves only as a nor- 
malization constant. The information I denotes all of 
the assumptions that are built into the analysis, par- 
ticularly that the NR waveforms represent reality (see 
Sec. II for associated discussion). Henceforth, to sim- 
plify notation, we shall not include X when writing the 
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conditional probability density When comparing 

different model combinations, we adopt the notation for 
conditional probabilities with arguments to the right of 
a vertical bar representing the data and arguments to 
the left signifying which model was used as templates. 
For example, results labeled p(EOB22|NR) come from the 
posterior distribution functions for models using EOB22 
as the templates and an NR waveform as the data. 

With the posteriors, we compare the statistical error 
(the characteristic width of the posterior) to the system- 
atic error (the displacement from the injected parameter 
values of the posterior's mode). We do not include a noise 
realization in the simulated data, as that introduces addi- 
tional biases — each noise realization pushing the best-fit 
solution away from the injected value in a different way 
— that are not easily quantified [59]. As the control 
in this experiment, we simulated (noise-free) data with 
EOB waveforms and use EOB templates for parameter 
estimation, giving us one set of results with no systematic 
bias, apart from the sampling error in the Markov chains 
due to their finite length. We use these controlled results 
as code verification, and as the standard against which 
the other models' performance is compared. We then 
test the EOB waveform models by injecting NR wave- 
forms (summed over all available modes) and using the 
two EOB models discussed in Sec. II as templates. The 
EOB22 model is used as a baseline, as it has been em- 
ployed in LIGO/Virgo search pipelines to analyze data 
collected in recent science runs [29, 60]. The EOBhh 
model is the most complete waveform at our disposal, 
and is used to measure how well the EOB model could 
perform on NR data. 

In Eq. (2) we use the standard gaussian logarithmic 
likelihood \np{d\e) ee ~{d~ h{e)\d ~ h{e)) /2 + C, where 
C is a normalization constant that does not depend on 
model parameters and is henceforth neglected, h is the 
template, and 

m) 

denotes a noise- weighted inner product with the sum on i 
over / (independent) interferometer channels and <S'^(/) 
is the one-sided noise power spectral density (PSD) for 
detector i. The bounds of integration, /mm & /Nyq, are 
the minimum frequency of the NR waveform and the 
Nyquist frequency of the data, respectively. The Nyquist 
frequency is chosen to ensure that the highest-frequency 
portion of the waveform is well below the instrument sen- 
sitivity curve, while /min is set by the duration of the NR 
waveform and the total mass of the system, such that we 
only integrate over frequencies where numerical data ex- 
ists. 

For each case, the Markov chains are run for ^ 10^ it- 
erations, taking about lO'^ CPU hours to complete. The 
chains rely on parallel tempering [61], differential evo- 
lution [62], and jumps along eigenvectors of the Fisher 
Information Matrix (FIM) (e.g., see Ref. [63]) computed 



from PN waveforms to efficiently explore the posterior 
distribution function. We use burn-in times of 10^ sam- 
ples, and run several chains with different initial loca- 
tions to check for convergence. Prior distributions for all 
parameters are chosen to be uniform. Azimuthal angu- 
lar parameters {a, -0, and ipp) have support over [0, 2n) 
with periodic boundary conditions, while declination-like 
angle parameters (sini5, cost) range from [—1, 1] with re- 
ficcting boundary conditions. The ranges for In M and 
IuDl are chosen to be large enough so as to not infiu- 
ence the posteriors. The prior range on InA^ is coupled 
to InM, as the maximum value of the chirp mass oc- 
curs for the q = 1 {ly = 1/4) case, and depends on the 
total mass M of the system. Because of this, the prior 
boundary on chirp mass docs affect the posteriors for the 
equal-mass systems considered in this work. 

The products of our analysis procedure are samples 
from the posterior distribution function p{6\d) — an 
oddly shaped, sometimes multimodal, blob living in a 9D 
space. There is no perfect way of distilling this informa- 
tion into a simple, robust, statistic to assess parameter- 
estimation accuracy. We will make do with the "frac- 
tional systematic error" 6(5g: For parameter 9, we first 
define the systematic error I3g = J^map — Oo\ where ^map 
is the maximum a posteriori (MAP) value and 9o is the 
injected value, while the statistical error is quantified by 
the standard deviation ae of the ID marginalized poste- 
rior distribution function. We then define the fractional 
systematic error as the ratio between /3e and the statis- 
tical error: 



S^e = —. (4) 



We consider templates that consistently yield 5 /3g <1 a.s 
introducing negligible bias, assuming the NR waveforms 
are exact, which, as seen in Sec. II, is not the case ^. 

The fractional systematic error (4) can be interpreted 
as the number of standard deviations away from the in- 
jected value at which we find the MAP waveform. This 
choice of statistic is not perfect — low-SNR systems have 
very non-gaussian posteriors making the standard devia- 
tion a poor choice for characterizing the statistical error. 
Furthermore, the MAP parameters are a single point and 
tell us nothing about how large a region in parameter 
space had similar posterior support to the current best 
estimate. Additionally, the MAP value is a feature of the 
full 9D posterior, while the variances are computed from 
the marginalized posterior distribution functions. This 
introduces complications for some special cases, as we 
shall discuss in detail below. 



^ Currently we have no way of incorporating the numerical error 
of the NR waveforms into our estimate and so we neglect it. 
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IV. RESULTS FOR ADVANCED LIGO 
DETECTORS 

The first test of the EOB waveforms uses simulated 
data from the network of advanced ground-based detec- 
tors expected to come on line in the middle of this decade: 
the two LIGO detectors in the USA and the Virgo de- 
tector in Italy. We use the same noise PSD for each in- 
terferometer, the "zero-detuned high-power" curve from 
Ref. [64], which is the sensitivity curve for the fully com- 
pleted Advanced LIGO detector. The GW response in 
each interferometer is modeled by convolving the GW 
signal with the beam-pattern function for that detector 
and applying the appropriate time-delays between inter- 
ferometers [65]. 



A. Choice of binary configurations 

We study several binary configurations using different 
mass-ratios, total masses, and S NRs. The SNR of the 
system is computed via SNR = y^[d\d), and its value is 
controlled by adjusting the luminosity distance D^. Be- 
cause our simulated data d contain no simulated noise, 
the SNR is simply the inner product of the injected wave- 
form with itself. Also, our definition of the inner prod- 
uct in Eq. (3) includes a summation over all interferom- 
eter channels, thus we quote the network SNRs for the 
ground-based studies. 

The first three panels of Fig. 1 show the time-domain 
EOBhh waveforms (blue, dotted) and whitened by the 
noise spectral density (red, solid) for three representa- 
tive cases studied here'^. The vertical lines indicate inter- 
vals in which contribute 10% of the signal power, start- 
ing from /mill for each system, with the rightmost line 
indicating where 99.9% of the power has accumulated. 
We focus on the latc-inspiral, merger, and ringdown por- 
tions of the waveform, beginning 1000 M before the peak 
of the (2, 2) mode's amplitude. The power intervals are 
included as a guide to see which portions of the wave- 
form contribute most to the parameter estimation. For 
instance, 50% of the q = 1, M = 50 Mq waveform's in- 
tegrated power is contained in the late-inspiral, merger, 
and ringdown, while the q = 2, M = 23 A/q signal is 
much less dominated by this interval, accounting for only 
30% of the power. The q = 6, M = 120 Mq examples 
are most influenced by the end of the waveform, which 
makes up over 70% of the power. 

The bottom-right panel of Fig. 1 shows the strain spec- 
tral densities |ft.(/)| for the same three systems, now using 
the NR waveforms used in this study. Also included is 
the Advanced LIGO power spectral density. 



"Whitened" waveforms are ones that have been Fourier- 
transformed to the frequency domain, rescaled by 1/ 1/ Sn (/), 
and finally re-transformed to the time domain [66]. 



We focus on moderately high-mass black-hole merg- 
ers with M ^ 50 AIq (e.g., the equal- mass case shown 
in Fig. 1: top- left panel, red solid curve in bottom- 
right panel). Beyond their potential as Advanced LIGO 
sources^, high- mass systems serve an important role in 
testing the waveform models for two reasons. First, as 
explained in Sec. II, the NR signals are short in duration 
and we do not supplement the waveform by hybridiz- 
ing the numerical data with analytic inspiral models at 
low frequency. We therefore require higher-mass sys- 
tems, merging at lower frequency, to ensure that most 
of the inspiral missing from the NR data will fall outside 
the sensitive measurement band of the detector. With 
M ^ 50 Mq, NR waveforms start at ~ 30 Hz, setting 
/mill in the inner product defined in Eq. (3). Comparing 
these data to EOB waveforms with the same parameters 
but /min = 10 Hz (below which the Advanced LIGO sen- 
sitivity is very poor) , we find the NR waveforms contain 
^ 85% of the total signal power, or ^ 90% of the total 
SNR. 

A second reason for focusing on M ^ 50 Mq for the 
binary is that these systems are "centrally located" in 
frequency over the most sensitive band of the advanced 
detectors (~ 30 to ~ 10'^ Hz, e.g., see the bottom-right 
panel in Fig. 1), such that inspiral, merger, ringdown, 
and additional modes all contribute to the overall signal 
power and, accordingly, the parameter-estimation capa- 
bilities. Generally speaking, we expect such systems to 
make the greatest demands on complete inspiral-merger- 
ringdown waveform model accuracy. 

While the M ^ 50 Mq systems serve as the basis 
for our comparisons, we include additional examples to 
probe regions of signal space that are of particular in- 
terest. These include a g = 6, I2OM0 system (Fig. 1: 
bottom-left panel, blue dotted curve in bottom-right 
panel) chosen such that the subdominant modes con- 
tribute the most, as they will be most pronounced at high 
mass-ratios, and signal power from the higher frequency 
modes is still in the sensitive band of the detector. At 
this mass, /min = 10 Hz, so our analysis is not missing 
any signal power due to the length of the NR data. 

We also go to lower masses, using a g = 2, M = 23 Mq 
binary (Fig. 1: top-right panel, green dashed curve in 
bottom- right panel) as a more likely LIGO/Virgo de- 
tection, to demonstrate the EOB models' parameter- 



* A binary with M ~ 50Mq is astrophysically relevant, as 
the largest black-hole mass ever observed is in the range of 
23-34 Mq [67, 68]. Recent results from population-synthesis 
studies suggest that massive, low-metallicity stars are capable 
of producing black holes as large as M ~ 80 Mq [69] , although 
these findings are for single stars only, and binary evolution could 
either increase or decrease the maximum black-hole mass. Addi- 
tionally, there exists at least one example of a massive Wolf-Rayet 
star, R136al [70], with M ~ 25OM0 at a distance of ~ 0.1 Mpc 
and with sufficiently low metallicity to produce a massive black 
hole. However, it cannot be excluded that the star goes instead 
through a pair-instability supernova, leaving no remnant. 
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q = 6, M = 120 
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FIG. 1: [Top row, and bottom-left] Time-domain BOB waveforms (blue dotted) and the same signal whitened by the noise 
spectral density (red solid) representing different test cases studied in this work. The time-axis is scaled by the total mass M 
and shifted by the merger time tp. The horizontal dotted lines demarcate 10% intervals for accumulated signal power, with the 
rightmost line at the 99.9% mark. [Bottom right] Strain spectral densities |/i(/)| showing the estimate for the Advanced LIGO 
noise curve (gray dotted line) and the NR waveforms for the same representative examples as the other panels. 



estimation accuracy not at the extremes of a potential 
binary signal, but within reasonable expectations of what 
the coming data may hold (apart from including the 
black-hole spins). It is worth noting that for these low- 
mass systems, we are missing a large portion of the in- 
spiral, as /,nin = 60 Hz and - 30% of the full SNR will 
be accumulated below that frequency. Therefore, these 
results might change in the future, when longer EOB and 
NR waveforms become available. 



B. Results on systematic biases at fixed inclination 

angle 

In Fig. 2 we plot the ID marginalized posterior dis- 
tribution functions for each parameter for the case of a 
binary with mass-ratio q = 2, total mass M = 51 Mq, 
and network SNR = 48, produced using an MCMC sam- 
pler. The inclination angle is chosen to be t = 7r/3. 



The independent variables in these plots arc ~ 
6 — ^inj, where Oinj are the injected parameter values. 
The p(EOB22|EOB22) histograms (green, dashed lines) 
are the posteriors using the EOB22 waveforms for both 
the signal and the templates. These confirm that the 
MCMC sampler is working properly, as the posteriors all 
show strong support for the injected waveform param- 
eters (peaking at or near 0), and statistical errors con- 
sistent with results in Ref. [30] obtained using FIM es- 
timates and phenomenological inspiral-merger-ringdown 
waveforms. 

The red (solid) lines and blue (dotted) lines are for 
data containing an NR signal and the EOB22 and EOBhh 
waveforms as templates, respectively. The bottom-right 
panel shows the logarithmic likelihood distributions for 
each chain. We can see from these posteriors that the 
EOB22 waveform is significantly biased away from the 
NR injected value, by 1% in both M and M. This 
bias is substantially reduced when using the EOBhh tem- 
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FIG. 2: Example marginalized posterior distribution functions for a binary system with mass-ratio q = 2, total mass M — 51Mq, 
and network SNR — 48 produced using an MCMC sampler. The green (dashed) histograms arise from injecting the EOB22 
waveforms and using the same model as the template. The red (solid) distributions use the EOB22 waveform while the blue 
(dotted) curves use the EOBhh waveform to recover an NR injection. The x-axis shows the distance away from the injected 
parameter value, so distributions that peak at zero show no bias. The bottom-right panel shows the logarithmic likelihood 
distributions, which are used to quantify the amount of residual power left behind by the waveform model. The y-axes have 
be normalized to range between [0, 1]. 



plate, to the point where the systematic error is well 
within the statistical error of the posterior. For the ex- 
trinsic parameters such as distance and sky location, the 
posteriors for the approximate templates are nearly iden- 
tical to those produced by using the exact same waveform 
for both data simulation and parameter estimation. Wc 
also see the role that subdominant modes play in break- 
ing the 7r/2 degeneracy in the polarization angle "0, which 
can aid in distance and sky-location determination for 
some systems. 

The SNR o f the residual d - h, given by SNRrcs = 
Y^— 2 lnp{d\d) can be inferred from the bottom-right 
panel. Viewed in this way, there is a distinct excess in 
the residual for even the EOBhh case (blue, dotted) in 
comparison with the idealized control MCMC residuals 
(green, dashed). To understand the significance of this, 
consider applying a detection threshold of SNR = 6 [71] 
for the residual waveform. This corresponds to a maxi- 



mum logarithmic likelihood of < —18, below which the 
residual could potentially contain enough power to be de- 
tected after the best-fit waveform is regressed from the 
data. Suppose in the near future we have model wave- 
forms at our disposal containing all of the details of black- 
hole mergers (i.e. spins and eccentricity) that generally 
produce SNRros ^ 6 for equally detailed NR simulations, 
and yet coherent residuals are consistently found in the 
data. Such an event could suggest a possible departure 
from general relativity. 

The results from our MCMC studies for different sys- 
tems are displayed in Tables I and II, which show the frac- 
tional systematic error 3/3 (see Eq. (4)) for the mass, sky 
location, and distance parameters. The injected wave- 
forms again had an inclination angle of t = 7r/3. We in- 
clude our estimate of the statistical error ag to quantify 
the precision of the advanced detectors for the systems 
considered here. The standard deviations should be in- 



8 













p(EOBlEOB) 






p(EOB22|NR) 






p(EOBhh 


NR) 




g 


M (Mo) 


/low (Hz) 


SNR 


0"ln M 


fJin 


(5/3ln M 




O" In M 


(Jin A4 


S /3ln AI 




O" In M 




SPln M 




1 


50 


30 


12 


0.02 


0.02 


0.11 


0.05 


0.02 


0.02 


0.34 


0.17 


0.02 


0.02 


0.29 


0.10 


1 


50 


30 


48 


4 X 10"^ 


3 X 10"^ 


1 X 10"^ 


0.14 


0.01 


0.01 


1.76 


0.84 


2 X 10" 


^ 2 X 10"^ 


0.79 


0.87 


2* 


23 


60 


12 


0.02 


0.01 


0.01 


0.02 


0.03 


0.02 


0.27 


0.07 


0.03 


0.02 


0.18 


0.26 


2 


51 


30 


12 


0.03 


0.02 


3 X 10"^ 


4 X 10"^ 


0.03 


0.02 


0.47 


0.28 


0.03 


0.02 


0.01 


0.01 


2 


51 


30 


48 


0.01 


0.01 


0.01 


0.01 


0.01 


0.01 


1.92 


1.39 


0.01 


0.01 


0.93 


0.32 


6 


56 


30 


12 


0.03 


0.03 


0.03 


0.08 


0.03 


0.03 


0.94 


0.66 


0.02 


0.02 


0.58 


0.39 


6 


56 


30 


48 


0.01 


5 X 10"=* 


0.23 


0.04 


0.01 


0.01 


3.47 


2.51 


0.01 


4 X 10"^ 


1.43 


0.84 


6* 


120 


10 


12 


0.03 


0.03 


0.05 


0.14 


0.03 


0.03 


1.67 


0.60 


0.02 


0.02 


0.29 


0.06 



TABLE I: Fractional systematic biases 6/3 [see Eq. (4)] and statistical errors a for intrinsic parameters as determined by the 
MCMC sampler. An asterisk in the mass-ratio column indicates examples where EOBhh was used for the p(EOB|EOB) study. 
All other examples used the EOB22 waveform. 



terpreted with caution; wo do not inchide noise in the 
simulated data, so the deviations are not representative 
of the "error bars" on a particular detection, but instead 
represent an ensemble average over idealized Gaussian 
stationary noise realizations of the statistical error for 
these particular systems. 

Table I contains the intrinsic parameters — those that 
affect the shape of the waveform. Because we consider 
non-spinning black holes, the only intrinsic parameters 
are the masses. The extrinsic^ or observer-dependent, 
parameters (i.e., distance and sky location) are given in 
Table IL They are encoded in the instrument response 
to the GW, instead of being imprinted in the phase and 
amplitude evolution of the waveform itself. We do not 
report on the orientation parameters l and ip or reference 
time tp and phase tpp parameters in this fashion, but 
note that results for these other parameters arc consistent 
with the extrinsic variables in Table IL 

From Table I we can see that generically, the EOB22 
waveforms are not as accurate as the EOBhh waveforms, 
which include the subdominant modes. This is true even 
for comparable-mass systems, where the subdominant 
modes only minimally contribute to the overall waveform 
power. The bias introduced by neglecting additional har- 
monics is not due to missing waveform power as much as 
it is caused by phase differences between a quadrupole- 
only template and the full NR data, as coherent matched- 
filtering analyses are typically more sensitive to phase 
than amplitude. 

The parameter estimation accuracy of the EOBhh 
model up to SNR 50 exceeds expectations from 
Rcf. [25], as can be seen by focusing on rows 2 and 7 
in Table I. Here we find systems chosen specifically to 
compare with Fig. 15 in Ref. [25] where, based on the 
accuracy requirement proposed in Refs. [41, 42], they 
predicted that systematic error could exceed statistical 
error at single-detector SNR ~ 35 (g = 1, M = 50Mq) 
and - 11 (q = 6, AI = 56Mo). ^ 



^ The accuracy criterion used in Refs. [25, 41, 42] is a "sufficient" 



Our analysis uses the LIGO/Virgo network of de- 
tectors, as opposed to the single-detector studies from 
Ref. [25]. This difference will not heavily impact the 
results, as it is the measurement of intrinsic parameters 
that is most affected by differences in the waveform model 
due to both the accuracy with which they are measured, 
and the way they are encoded in the phase evolution of 
the signal. Measurement of the intrinsic parameters is 
not greatly influenced by the inclusion of additional de- 
tectors in the network (at a fixed SNR). Our findings 
show that, even at SNRs that are rather high for an ex- 
pected LIGO detection, the EOBhh model introduces 
systematic errors that differ by < Ict from the injected 
parameters. 

The extrinsic parameters, on the other hand, are in- 
ferred mostly from the overall amplitude of the wave- 
form, which is not as well-measured as phase, and the 
time of arrival of the signal at each detector. We thus 
expect that the extrinsic parameters, determined with 
lower fidelity than the masses, will be better able to toler- 
ate small differences between template waveform models 
within the statistical error. Adding additional detectors 
to the network dramatically improves the statistical er- 
ror for extrinsic parameters, mostly due to the increased 
baseline [72] , but not to the point of becoming influenced 
by the waveform systematics. 

Indeed, we find that the relative systematic biases for 
extrinsic parameters are generally smaller that those of 
the intrinsic parameters. For systems with q > 2, re- 
gardless of the SNR or the EOB model, the systematic 
errors are consistently smaller than the statistical errors, 
even when the (2, 2)-only waveform is used as the tem- 
plate. This is evident in Fig. 2, where the Dl, sin 5, and 
a posteriors are nearly indistinguishable, despite the sig- 
nificant difference in the residual left behind by the wave- 
form model, as shown in the bottom-right panel contain- 



but "not necessary" requirement for parameter estimation, and 
it does not say which of the binary parameters will be biased and 
how large the bias will be. Thus, the authors of Ref. [25] were 
making conservative judgments about the waveform accuracy. 
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TABLE II: Same as Table I, except here we show a subset of the extrinsic parameters corresponding to the binary's location. 
Because of their similarity between each run, the statistical errors are displayed once but apply to each example. Results for 
the q = 1, SNR = 12 example are omitted due to the failure of our Sp statistic. 



ing the logarithmic likelihood distributions. The same 
can not be said for the equal-mass cases (top two rows 
in Table II), where we encounter a subtle effect from our 
choice of statistic, 5/3. 



For the SNR = 12, equal-mass case, the 5/3 statistic 
breaks down and results are omitted from the table. At 
such a low signal strength, the orientation parameters 
are very poorly measured, with the polarization angle tp 
effectively unconstrained. These large measurement un- 
certainties cascade through the ID posteriors via strong 
Tp — i and L — Dl covariances. We are left with an un- 
constrained distribution that is poorly characterized 
by the variance, and large stochastic variation from one 
Markov chain run to the next as to where the MAP 
parameters lie. This degeneracy is evident in Fig. 3, 
where we show the 2D marginalized posterior distribu- 
tion function of the ip — cos l plane (left panel) from a 
p(EOBhh|NR) run, and the maximum logarithmic like- 
lihood found in the Markov chain for different bins in 
space (right panel) from both p(EOB22|EOB22) and 
p(EOBhh |NR). We see a virtually flat distribution of the 
maximum logarithmic- likelihood values between '--^ 0.5 to 
~ 2.25 Gpc, with well over half of the allowed parameter 
space in the ip — cos l plane receiving significant posterior 
support. The injected value of Dl was near 1 Gpc. 



The over-density at {V',cost} ~ {tt/S, 0.5} in the left 
panel corresponds to the injected parameter values, with 
the 7r/2 — shift degenerate mode still appearing despite 
the inclusion of subdominant modes. Recall that this 
is an equal-mass system, where the subdominant modes 
are the least noticeable. The over-density at cos /, 1 is 
due to the Markov chain preferring template waveforms 
with minimal contribution from subdominant modes (to 
match the strictly equal-mass injection) as the sampler 
explores higher mass-ratios, up to g ^ 3 in this case. Sys- 
tems with cos t = k ■ L = 1 are face-on and it is this con- 
figuration where the subdominant modes are least promi- 
nent. 



C. Dependence of the results on the inclination 
angle 



Due to the high computational cost of each MCMC 
run, we are not able to Monte Carlo over a large popula- 
tion of binary systems. We instead have chosen extrinsic 
parameters away from the extremes of parameter space. 
This means sky locations that are away from nulls in any 
detector's response, and inclinations (t = 7r/3) that were 
not edge- or face-on with respect to the observer's line of 
sight. 

One of the more interesting results from this study is 
the impact of the subdominant modes on the parameter- 
estimation capabilities of ground-based detectors. The 
role that the additional modes play in the waveform de- 
pends heavily on both the mass-ratio and the orienta- 
tion of the binary — edge-on systems have the largest 
contribution from the additional modes, while face-on 
systems are most dominated by the (2,2) mode. It is 
therefore possible that, for some more extreme orienta- 
tions, systematic biases could become large because of 
the increased importance of the additional modes. 

To allay this concern we performed a series of MCMC 
runs on a system where the subdominant modes would 
play an important role, exploring the edges of orientation 
space for each run. We chose the M = 120Mq, g = 6 
system (row 8 in Tables I and II) and analyzed three 
different orientations: edge-on [l = 7r/2), face-on (t = 0), 
and moderate tilt (t = 7r/3). We compare the AlnM 
and A In 7W posteriors for each of these systems using the 
EOBhh model as a template to study data containing 
an NR waveform injected at SNR= 12. The results in 
Table III show the fractional systematic error well below 
unity for each orientation regardless of the inclination 
angle. We also include the percentage of the total SNR 
that comes from the subdominant harmonics (HH) . This 
result confirms that the parameter-estimation accuracy 
of the EOB model is robust to different orientations, and 
thus different strengths of the additional modes. 
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FIG. 3: Selected results from the q — 1, SNR — 12 run to exhibit the breakdown of Sp as a useful statistic. The left-hand 
panel shows the 2D marginalized posterior for the orientation angles i/i and cos t, with darker colors corresponding to higher 
probability density. The right-hand panel displays the maximum logarithmic likelihood as a function oi Dl, which is effectively 
uniform between ~ 0.5 and ^ 2.25 Gpc. 
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TABLE III: Fractional systematic errors and statistical errors 
for In A/ and InM when M = 120Mq, g = 6, SNR = 12 
and for three different inclinations: Edge-on (t = 7r/2), face- 
on (t = 0), and an intermediate orientation (t = 7r/3). We 
also include the percentage that the additional modes (HH) 
contribute to the total SNR. 



D. Simulating a detection 

All of the above results have been performed on simu- 
lated data that do not contain any noise, but do include 
the noise PSD in the inner product defined in Eq. (3). 
Thus the posteriors that we generate are not represen- 
tative of a probability density function for an actual 
GW measurement, but instead are the hypothetical aver- 
aged measurements of the same system in an ensemble of 
noise realizations [73-75]. To more realistically demon- 
strate the parameter-estimation capabilities of advanced 
ground-based interferometers, we want now to simulate 
a single LIGO/Virgo detection. 

To that end, we use again the binary configuration 
with q^2, M ^ 51Mq and network SNR of 12, but now 
add stationary, gaussian noise to the NR waveform using 
the same PSD as in the noise-free study. The resultant 
posteriors are then representative parameter-estimation 
products, subject to the following important caveats: 

• We use the same PSD for each detector when, 
in practice, each interferometer will have different 
sensitivity at any given time. Furthermore, the 
Virgo design sensitivity is not identical to LIGO 



(although it is qualitatively similar) . We also effec- 
tively introduce a noise "wall" at 30 Hz to account 
for the limited duration of the NR data. 

• We do not include any calibration errors in the 
waveform injections, which could prove to be a 
significant contribution to the overall parameter- 
estimation error budget [76]. Furthermore, we do 
not account for intrinsic error in the NR waveforms. 

• We recognize that simulated additive gaussian 
noise is different from injecting waveforms into real 
LIGO/Virgo noise [59]. 

For this study we find the 2D marginalized posterior 
distribution functions to be of the most interest. We show 
results for the sky location in Fig. 4 and mass parameters 
in Figs. 5 and 6. 

In Fig. 4, the sky-location posterior is shown in a Mol- 
weide projection with the detector locations projected on 
to the celestial sphere. The white, dotted lines show the 
circles of constant time delay between each pair of de- 
tectors. The posterior should sit at intersections of these 
lines, and the principal axis should lie along a line. A 
small white square is included, centered on the injected 
position. The injected values for the sky location are 
contained within the 63% confidence interval of the 
posterior (the red region of the error ellipse). The in- 
jected sky location was chosen to be a region where the 
SNR in each detector was roughly equivalent. 

Of more pertinence to this study are the mass pos- 
teriors. While InAf and InA^ are the most convenient 
parameters for the MCMC sampler, being the most or- 
thogonal, they are not of the most interest to the wider 
astrophysical community. A better data product would 
be posteriors on either the individual masses toi and 
TO2, or the total mass M and mass-ratio q. In post- 
processing we take the MCMC chains and compute the 
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FIG. 4: 2D marginalized posterior distribution function for the sky location of a g = 2, M = 50Mq, binary with SNR — 12 
injected into simulated stationary gaussian noise colored by the Advanced LIGO noise power spectral density. The red, yellow, 
and cyan regions roughly correspond to 1, 2, and 3a" confidence regions. The white box represents the injected sky location of 
this source. The position of each interferometer in the network is projected onto the sky, labeled with H (LIGO Hanford), L 
(LIGO Livingston), and V (Virgo). The white, dashed lines show the locations that yield the same time-delay between each 
pair of detectors for the injected sky position. 



relevant mass parameters at each step in the chain. The 
injected component black holes have masses mi = SAMq 
and 1712 = 17 Mq. Wc show the 2D marginalized poste- 
rior distribution functions for the the toi-TO2 (Fig. 5), 
and M-q plane (Fig. 6), where the color corresponds to 
the posterior density. 
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FIG. 6: Same as Fig. 5, but now depicting total mass M and 
mass-ratio q. 



FIG. 5: 2D marginalized posterior distribution function for 
the individual masses mi and m2 of a g = 2, AI = 51A/q, bi- 
nary with SNR — 12, injected into simulated stationary gaus- 
sian noise colored by the Advanced LIGO noise power spectral 
density. The injected values for {7711,7712} were {34, 17}Mq. 

These figures give a good depiction of just how cor- 
related the mass parameters are with one another, and 
how much of parameter space is supported by the chain 



in part due to that strong correlation. For example, the 
M-q plane has significant support for mass-ratios be- 
tween 1 and 3, compatible with previous LIGO MCMC 
studies using PN waveforms (e.g., see Ref. [51]). These 
are the type of parameter-estimation products that the 
astrophysics community can anticipate as the advanced 
detectors come on line in the coming years. 
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V. RESULTS FOR SPACE-BASED DETECTORS 

For BOB waveforms that include subdoniinant modes, 
we have found relatively small systematic errors in 
parameter-estimation results for ground-based observa- 
tions with SNR < 50. Because ground-based GW instru- 
ment rates are limited by sensitivity, higher-SNR events 
are exceedingly unlikely in the first generation of detec- 
tions. Proposed space-based instruments will be sensi- 
tive to supermassive black-hole (SMBH) mergers out to 
cosmological scales, such that a significant fraction of de- 
tected events may have SNR > 100. Space-based in- 
struments arc typically sensitive to these events over a 
broad bandwidth covering a large number of cycles lead- 
ing up to merger [4, 77]. Such observations will make 
much greater demands on the accuracy and efficiency of 
inspiral-merger-ringdown waveform templates. Though 
considerable effort has gone into estimating the ability 
of space-based instruments to measure astrophysical pa- 
rameters assuming accurate waveforms, very little has 
been done to assess the template requirements for these 
future observations. Here we make a limited exploration 
of this capability with current numerical-relativity and 
EOB waveforms. 

Of several proposed space-based GW interferometer 
instruments [3, 4, 78, 79], the best-studied concept is 
the classic LISA mission [3] . While acknowledging that 
there is currently considerable uncertainty about when 
and how the first space-based GW instrument will be de- 
veloped, we choose to study the classic LISA configura- 
tion to make contact with the large body of work that has 
already been dedicated to black-hole merger parameter 
estimation (e.g., see Refs. [31, 32, 80-82]). To compare 
with other concepts, the most relevant alteration from 
the classic LISA design is the arm length (e.g., from 5 
Gm for classic LISA down to 1 Gm for cLISA), which 
sets the overall scale for parameter-estimation capabil- 
ities; our results for a more modest detector configura- 
tion would be very similar to those for classic LISA, after 
appropriately rescaling the total mass of the black-hole 
system. 

We follow here the same procedure outlined for the 
LIGO /Virgo studies in Sec. IV, where NR waveforms are 
injected into simulated noise-free data, and the signals 
are analyzed using the EOB model as a template. Be- 
cause we saw significant bias in the EOB22 model at SNR 
^ 50 it is safe to assume that those errors will only grow 
with SNR, and so we focus these runs only on the EOBhh 
model. 

The duration of the available NR data restricts us to 
brief LISA observations, for which we can apply the static 
limit for the detector; thus we neglect EISA's orbital mo- 
tion during the observation time. Consistent with this, 
we focus on systems of mass 3 x 10'' Mq at the high end of 
EISA's sensitive range. For such observations the max- 
imum frequency attained by the merger signal is well 
below the transfer frequency of the detector (when the 
wavelength of the GW signal is comparable to the size 



of the detector). In this low- frequency, static regime, the 
instrument response is equivalent to two 60° Michelson 
interferometers, co- located, and misaligned by 7r/3 radi- 
ans. The antenna patterns for this configuration, and the 
discussion of the two limits applied here, can be found in 
Ref. [83]. 

We consider mass-ratios in the range 2 < g < 6, 
observed at SNR=100, which would make these unusu- 
ally distant for LISA observations. As shown in Fig. 7, 
the power-spectral density is similar over this mass-ratio 
range with more structure at g = 6. The noise- weighted 
waveforms for these cases arc most comparable to the 
largest-mass I2OM0 LIGO case that we studied. Space- 
based instruments are not expected to have a strong 
power-law slope like the seismic noise wall in the LIGO 
sensitivity curve, meaning that even for large masses 
there is a softer degradation of sensitivity going back 
to the early portions of the signal, making our fmm cut 
somewhat more artificial here. 

Generally the higher SNRs of our nominal LISA obser- 
vations would predict larger bias from systematic errors 
in the template waveforms. Because of the differences be- 
tween the sensitivity curves and response for LISA and 
LIGO, however, it is not straightforward to scale up such 
expectations. Table IV shows our results for the param- 
eter biases for mass M and chirp mass M. Unlike the 
LIGO results, the biases here are already statistically sig- 
nificant in most cases for LISA observations at SNR=100, 
reaching a few times the statistical error level. In Ta- 
bic IV we also provide the SNR of the residuals after 
the MAP waveforms are removed from the data. These 
residuals with SNR > 6 would be detectable and could 
therefore lead to biases in estimates of overlapping sig- 
nals. Residuals at this level would also limit the utility 
of the current waveform templates for studies aimed at 
testing general relativity [55, 56]. 

For LISA, however, such a system would have to be 
exceedingly distant, at redshift 2: > 20 or more, in order 
to expect an SNR as small as 100 [82]. Actual SNRs could 
be as much as 100 times larger, and we would expect 
correspondingly larger relative biases and residuals. Full 
interpretation of LISA data would thus require higher 
levels of template accuracy. Even the level of errors in 
the numerical simulations used here in place of the exact 
predictions of general relativity are far too large to avoid 
biasing such high-SNR measurements. 

That said, while the statistical error should decrease 
linearly with increasing SNR (causing 5(3 to grow), the 
systematic error should remain approximately the same, 
and the absolute biases in the mass parameters are small. 
Reading from Table IV, we see that the systematic errors 
in M and arc ^1%. Converting these into the indi- 
vidual masses, the biases on mi and m2 are at most a 
few percent. While the MAP mass parameters may end 
up many a away from the true values and would thus fall 
short of an optimal analysis of LISA data, such ^ 1% 
errors in the masses are still small in astrophysical terms 
and may have little impact on key inferences made about 
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FIG. 7: Same as Fig. 1 now showing tlie LISA examples using M = 3 x 10^ Mq at SNR = 100. The time-domain waveform 
in tlie left-liand panel is for the q = 6 case. 
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TABLE IV: Same as Table I but for SMBH mergers as seen by LISA. Here we also include the residual SNR after the MAP 
waveform is regressed from the data. 



the supermassive black-hole population. For example, 
when trying to constrain black-hole formation scenarios 
using simulated cLISA BBH catalogs, Amaro-Seoanc et 
al. [4] (based on the procedure in Ref. [54] ) endured sta- 
tistical errors as high as ^ 1% and were still able to easily 
discriminate between black hole seed models. 

The limitations of our current analysis prevent us from 
providing any detailed indication of how far template ac- 
curacy must be improved for space-based observations. 
For the detector configuration used here, the extrinsic 
parameter estimation at SNR = 100 is actually worse 
than that for the ground-based detectors. Space-based 
detectors rely on the long duration of the signals, the am- 
plitude and Doppler modulations caused by the orbital 
motion, and finite-arm-length effects, to break degenera- 
cies among the system parameters. We are limited here 
to working in a regime where all of those features are 
missing from our instrument model, so our classic LISA 
response is more like a two-detector co-located LIGO net- 
work operating at significantly lower frequencies than the 
existing ground-based detectors. 

More complete analysis will require either dramatically 
longer-duration numerical simulations or a suitably well- 
controlled way of matching analytical and NR waveforms 
at low frequency that does not add as much systematic 
error as the template models being tested. We would also 
require more computationally efficient signal generation 



to successfully study template biases with long-duration 
waveforms. 



VI. CONCLUSIONS AND DISCUSSION 

In this paper, we have produced the first measurement 
of systematic errors introduced by using EOB templates 
to analyze NR waveforms. Our study's main focus was 
on stellar-mass BBHs observed by Advanced LIGO and 
Virgo. We also considered supermassive black- hole merg- 
ers detectable by space-based interferometers like LISA. 
We have injected NR waveforms into simulated data and 
have used an MCMC sampler to characterize the poste- 
rior distribution function for the astrophysical parame- 
ters. Our metric for assessing the size of the systematic 
error is to compare the offset between the injected and 
best-fit parameters to the statistical error characterized 
by the standard deviation of the ID marginalized poste- 
riors. 

For the stellar-mass systems we have investigated, 
when including the subdominant modes, we find system- 
atic biases consistently comparable to, or smaller than, 
the statistical errors for mass-ratios up to q = 6 and 
SNRs < 50. We have tested these waveforms in the 
most stringent way possible, simulating high-SNR events 
where the merger (the least reliable part of the wave- 
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form calculations) is peaking in the most sensitive band 
of the detector, and the higher- frequency modes con- 
tribute significantly to the overall signal power. For the 
g = 6 waveforms weighted by the detector PSD, the frac- 
tion of power contained in the subdominant modes is 
11% and 16% for the M = 56Mq and AI = I2OM0 
systems, respectively. We also tested low-mass sys- 
tems (M ~ 20 M0) to better represent likely Advanced 
LIGO/Virgo detections. 

In all of these examples, the bias introduced by the 
EOB waveform in Ref. [25] was at worst comparable to 
the statistical errors. For several of our examples cho- 
sen specifically to compare with that paper, we find that 
the EOB waveforms accurately recover binary param- 
eters at SNRs higher than were predicted there using 
the (deliberately) conservative accuracy requirements of 
Refs. [41, 42]. 

Matched-filtering analyses are most sensitive to the 
phase of the signal, and it is the phase of the waveform 
that is most infiuenced by different choices of the model. 
It is therefore predictable that the largest biases appear 
in the mass parameters. On the other hand, the extrinsic 
parameters have comparatively less impact on the shape 
of the signal — the distance comes in as an overall ampli- 
tude scaling, and the sky location is (for ground-based in- 
terferometers) predominantly determined through trian- 
gulation based on time delays between detectors. There- 
fore a model waveform used for parameter estimation 
has much more room for error if, for instance, the lo- 
cation of the binary is the primary interest (say, for op- 
tical counterpart searches) and the requirements on the 
phase-matching are not as severe. 

While the results here are undoubtedly positive, there 
is still work to be done in waveform modeling. We arc 
only testing the EOB waveforms over the last 30-40 GW 
cycles before merger and there is no guarantee that longer 
waveforms will not accumulate larger phase errors during 
the early portion of the inspiral. Furthermore, this study 
neglects significant aspects of the waveform structure re- 
lated to black-hole spins and orbital eccentricity. A sim- 
ilar study will need to be performed with long-duration, 
spinning systems once both the NR and EOB waveforms 
are prepared for that test. It will also be valuable to test 
the EOB waveforms over a broader class of NR simula- 
tions, including those that were not used to calibrate the 
template model. Moreover, we have assumed in this pa- 
per that the NR waveforms were exact but, as discussed 
in Sec. II, this is not the case. One possibility for taking 
the numerical error into account is to inject NR wave- 
forms computed at different resolutions and / or extracted 
at different radii and measure the EOB systematic biases 
in each case. The difference between these biases can pro- 



vide us with an estimate of the intrinsic error caused by 
the NR waveforms deviating from the exact solution in 
general relativity. Finally, we do not consider the effects 
of real detector noise or calibration errors in the data, 
both of which could prove to be a significant contribu- 
tion to the overall error budget [59, 76]. 

While EOB waveforms that include subdominant 
modes were found to have relatively small systematic er- 
rors in parameter-estimation results for ground-based ob- 
servations with SNR < 50, proposed space-based instru- 
ments are sensitive to SMBH mergers with SNR > 100. 
For these scenarios, we find statistically significant bi- 
ases in the mass parameters for mass-ratios in the range 
2 < q < 6 observed at SNR = 100, on the order of - 1% 
for the component masses of the system. However, as 
discussed in Sec. V, systematic errors introduced by the 
EOB templates arc small enough to still place strong con- 
straints on the population of supermassive black holes in 
the Universe. 

In the LISA examples, the residual power SNR is > 6, 
sufficient to compromise both parameter estimation of 
overlapping signals and studies aimed at testing general 
relativity. Space-based GW data analysis will require 
more accurate templates, grounded in even more accurate 
numerical simulations. 

A more complete analysis is needed, but will require ei- 
ther dramatically longer-duration numerical simulations 
or a suitably well-controlled way of matching analyti- 
cal and NR waveforms at low frequency. We would also 
require more computationally efficient signal generation 
to successfully study template biases with long-duration 
waveforms. 
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